| Hector R. Gavilanes | Chief Information Officer |
| Gail Han | Chief Operating Officer |
| Michael T. Mezzano | Chief Technology Officer |
University of West Florida
November 2023
The prcomp() function performs principal component analysis by applying singular value decomposition to the centered (and optionally scaled) data matrix, rather than by an eigendecomposition of the covariance matrix.
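A minimal sketch of what prcomp() computes, checked against a direct call to svd(). The built-in USArrests dataset is only a stand-in here and is not the data used in this presentation.

```r
# Stand-in data (assumption): built-in USArrests, not the presentation's dataset
X <- scale(USArrests, center = TRUE, scale = TRUE)

# PCA via prcomp() on the same centered and scaled data
pca_fit <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Direct singular value decomposition of the standardized matrix
sv <- svd(X)

# Component standard deviations are the singular values divided by sqrt(n - 1)
sdev_match <- isTRUE(all.equal(pca_fit$sdev, sv$d / sqrt(nrow(X) - 1)))

# Loadings are the right singular vectors (compared up to sign)
rotation_match <- isTRUE(all.equal(abs(unname(pca_fit$rotation)), abs(sv$v)))
```

Both comparisons should hold because prcomp() centers and scales the data and then takes the SVD of the result.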
Driven by multicollinearity.
Features less significant in explaining variability.
All variables are numeric.
Categorical Index variable.
34 missing values.
Imputation of missing values using the mean (\(\mu\)).
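Mean imputation can be sketched as follows. The toy data frame and the helper name impute_mean are hypothetical and only illustrate the idea of replacing each missing value with its column mean.

```r
# Toy data with missing values (hypothetical, for illustration only)
df <- data.frame(a = c(1, NA, 3), b = c(NA, 4, 6))

# Hypothetical helper: replace NA entries of a numeric vector with its mean
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Apply the helper column by column
df_imputed <- as.data.frame(lapply(df, impute_mean))

# Count of remaining missing values
remaining_na <- sum(is.na(df_imputed))
```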
Mean (\(\mu = 0\)); Standard Deviation (\(\sigma = 1\))
\[ Z = \frac{{ x - \mu }}{{ \sigma }} \]
\[ Z \sim N(0,1) \]
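As a small illustration of the z-score formula, the sketch below standardizes a toy data frame with scale() and compares one column against the formula applied by hand. The data values are made up for the example.

```r
# Toy numeric data (hypothetical values)
x <- data.frame(height = c(150, 160, 170, 180, 190),
                weight = c(50, 60, 65, 80, 95))

# scale() subtracts each column mean and divides by the column standard deviation
z <- scale(x)

# The same transformation written out for one column
z_height_manual <- (x$height - mean(x$height)) / sd(x$height)

# After standardization every column has mean 0 and standard deviation 1
col_means <- colMeans(z)
col_sds <- apply(z, 2, sd)
```

Scaling changes only the location and spread of each variable. The shape of its distribution stays the same.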
3 Outliers
No leverage
Minimal difference.
No observations removed.
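The slides do not state how outliers and leverage were assessed. One common approach, sketched here under that caveat, uses standardized residuals, hat values, and Cook's distance from a linear model. The mtcars data, the model formula, and the rule-of-thumb cutoffs are all assumptions for illustration.

```r
# Stand-in model (assumption): built-in mtcars, not the presentation's dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nrow(mtcars)
p <- length(coef(fit))  # number of estimated coefficients

# Leverage: diagonal of the hat matrix; 2p/n is a common rule-of-thumb cutoff
lev <- hatvalues(fit)
high_leverage <- which(lev > 2 * p / n)

# Outliers: standardized residuals beyond +/- 3
std_res <- rstandard(fit)
outliers <- which(abs(std_res) > 3)

# Influence: Cook's distance with the 4/n rule of thumb
cooks <- cooks.distance(fit)
influential <- which(cooks > 4 / n)
```

Refitting with and without the flagged rows is one way to judge whether removing them makes more than a minimal difference.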
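# Load packages: caTools provides sample.split(), caret provides preProcess()
library(caTools)
library(caret)
# reproducible random sampling
set.seed(my_seed)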
# Create Target y-variable for the training set
y <- train_data$expected_survival
# Split the data into training and test sets
split <- sample.split(y, SplitRatio = 0.7)
training_set <- subset(train_data, split == TRUE)
test_set <- subset(train_data, split == FALSE)
# Perform Principal Component Analysis (PCA) preprocessing on the training data
pca <- preProcess(training_set[, -target_index],
method = 'pca', pcaComp = 8)
# Apply PCA transformation to original training set
training_set <- predict(pca, training_set)
# Reorder columns, moving the dependent feature index to the end
training_set <- training_set[c(2:9, 1)]
# Apply PCA transformation to original test set
test_set <- predict(pca, test_set)
# Reorder columns, moving the dependent feature index to the end
test_set <- test_set[c(2:9, 1)]
8 Principal Components
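The same fit-on-training, apply-to-test pattern can be sketched in base R with prcomp() and predict(), which avoids the package dependencies. This is an illustrative variant rather than the presentation's pipeline: the mtcars data, the mpg target, and the seed value are stand-in assumptions.

```r
set.seed(42)  # hypothetical seed, stands in for my_seed

# Stand-in data (assumption): mtcars with mpg as the target
data <- mtcars
target <- "mpg"
predictors <- setdiff(names(data), target)

# 70/30 split into training and test rows
n <- nrow(data)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- data[train_idx, ]
test <- data[-train_idx, ]

# Fit PCA on the training predictors only, so the test set stays unseen
pca_fit <- prcomp(train[, predictors], center = TRUE, scale. = TRUE)

# Keep the first k components and put the target in the last column
k <- 8
train_pc <- data.frame(predict(pca_fit, train[, predictors])[, 1:k],
                       mpg = train[[target]])
test_pc <- data.frame(predict(pca_fit, test[, predictors])[, 1:k],
                      mpg = test[[target]])

# Cumulative proportion of variance explained by the components
cum_var <- cumsum(pca_fit$sdev^2) / sum(pca_fit$sdev^2)
```

Inspecting cum_var[k] shows how much of the training variance the retained components capture, which is one way to judge a choice such as eight components.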